Search for: All records

Editors contains: "Vedaldi, A."

Total Resources: 7

Resource Type
Conference Paper: 7
Conference Proceeding: 0
Dataset: 0
Journal Article: 0
Workshop Report: 0

Availability
Full Text / Resource Available: 7
Citation Only: 0


Attentive Normalization

https://doi.org/10.1007/978-3-030-58520-4_5

Li, X. ; Sun, W. ; Wu, T. ( August 2020 , 16th European Conference on Computer Vision (2020))
Vedaldi, A. (Ed.)
In state-of-the-art deep neural networks, both feature normalization and feature attention have become ubiquitous. They are usually studied as separate modules, however. In this paper, we propose a light-weight integration of the two schemas and present Attentive Normalization (AN). Instead of learning a single affine transformation, AN learns a mixture of affine transformations and uses their weighted sum as the final affine transformation applied to re-calibrate features in an instance-specific way. The weights are learned by leveraging channel-wise feature attention. In experiments, we test the proposed AN using four representative neural architectures on the ImageNet-1000 classification benchmark and the MS-COCO 2017 object detection and instance segmentation benchmark. AN obtains consistent performance improvements for different neural architectures in both benchmarks, with an absolute increase in top-1 accuracy on ImageNet-1000 of between 0.5% and 2.7%, and absolute increases of up to 1.8% and 2.2% for bounding box and mask AP on MS-COCO, respectively. We observe that the proposed AN provides a strong alternative to the widely used Squeeze-and-Excitation (SE) module. The source code is publicly available at https://github.com/iVMCL/AOGNet-v2 (the ImageNet Classification Repo) and https://github.com/iVMCL/AttentiveNorm_Detection (the MS-COCO Detection and Segmentation Repo).
Full Text Available
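
The mixture-of-affine-transformations idea described in the Attentive Normalization abstract above can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repositories linked in the abstract). The function name `attentive_norm`, the attention parameters `w` and `b`, the softmax used to produce the mixture weights, and the per-instance channel standardization are all assumptions made for illustration.

```python
import math


def attentive_norm(x, gammas, betas, w, b, eps=1e-5):
    """Illustrative sketch only; all names and choices here are assumptions.

    x:      C lists of N floats (features of one instance, C channels).
    gammas: K lists of C floats (scales of the K affine components).
    betas:  K lists of C floats (shifts of the K affine components).
    w, b:   hypothetical attention parameters (K lists of C floats, K floats).
    """
    channels = len(x)
    # Channel-wise summary of the instance: one mean per channel.
    means = [sum(ch) / len(ch) for ch in x]
    # One attention logit per affine component, computed from the summary.
    logits = [sum(wk[c] * means[c] for c in range(channels)) + bk
              for wk, bk in zip(w, b)]
    # Softmax (an assumption) turns the logits into mixture weights.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    lam = [e / total for e in exps]
    # Instance-specific affine parameters: weighted sum of the K components.
    gamma = [sum(l * g[c] for l, g in zip(lam, gammas)) for c in range(channels)]
    beta = [sum(l * s[c] for l, s in zip(lam, betas)) for c in range(channels)]
    # Standardize each channel, then apply the mixed affine transformation.
    out = []
    for c, ch in enumerate(x):
        var = sum((v - means[c]) ** 2 for v in ch) / len(ch)
        std = math.sqrt(var + eps)
        out.append([gamma[c] * (v - means[c]) / std + beta[c] for v in ch])
    return out, lam


features = [[1.0, 3.0], [0.0, 4.0]]
scales = [[1.0, 1.0], [3.0, 3.0]]
shifts = [[0.0, 0.0], [2.0, 2.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
normalized, mixture = attentive_norm(features, scales, shifts, zero_w, zero_b)
```

With a single component (K = 1) the mixture weight is always 1, so the sketch reduces to an ordinary learned affine transformation after normalization.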
OnlineAugment: Online Data Augmentation with Less Domain Knowledge

https://doi.org/10.1007/978-3-030-58571-6_19

Tang, Z. ; Gao, Y. ( November 2020 , European Conference on Computer Vision (ECCV))
Vedaldi, A. ; Bischof, H. (Ed.)
Full Text Available
Contrastive Learning for Unpaired Image-to-Image Translation

https://doi.org/10.1007/978-3-030-58545-7_19

Park, Taesung ; Efros, Alexei ; Zhang, Richard ; Zhu, Jun-Yan ( October 2020 , Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science)
Vedaldi, A. ; Bischof, H. (Ed.)
Full Text Available
RANSAC-Flow: Generic Two-Stage Image Alignment

https://doi.org/10.1007/978-3-030-58548-8_36

Shen, Xi ; Darmon, François ; Efros, Alexei ; Aubry, Mathieu ( October 2020 , Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science)
Vedaldi, A. ; Bischof, H. (Ed.)
Full Text Available
Action Localization Through Continual Predictive Learning

https://doi.org/10.1007/978-3-030-58568-6_18

Aakur, Sathyanarayanan ; Sarkar, Sudeep ( November 2020 , European Conference on Computer Vision)
Vedaldi, A. ; Bischof, H. ; Brox, T. ; Frahm, JM. (Ed.)
The problem of action localization involves locating the action in the video, both over time and spatially in the image. The current dominant approaches use supervised learning to solve this problem. They require large amounts of annotated training data, in the form of frame-level bounding box annotations around the region of interest. In this paper, we present a new approach based on continual learning that uses feature-level predictions for self-supervision. It does not require any training annotations in terms of frame-level bounding boxes. The approach is inspired by cognitive models of visual event perception that propose a prediction-based approach to event understanding. We use a stack of LSTMs coupled with a CNN encoder, along with novel attention mechanisms, to model the events in the video and use this model to predict high-level features for the future frames. The prediction errors are used to learn the parameters of the models continuously. This self-supervised framework is not as complicated as other approaches but is very effective in learning robust visual representations for both labeling and localization. The approach produces its output in a streaming fashion, requiring only a single pass through the video, which makes it amenable to real-time processing. We demonstrate this on three datasets (UCF Sports, JHMDB, and THUMOS’13) and show that the proposed approach outperforms weakly-supervised and unsupervised baselines and obtains competitive performance compared to fully supervised baselines. Finally, we show that the proposed framework can generalize to egocentric videos and achieve state-of-the-art results on the unsupervised gaze prediction task.
Full Text Available
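
The prediction-driven, single-pass learning loop described in the abstract above can be illustrated with a toy sketch. It deliberately swaps the paper's LSTM stack, CNN encoder, and attention mechanisms for a plain linear predictor, so it shows only the general pattern: predict the next frame's features, measure the error, and update the parameters on the spot without labels. The function name `stream_predictive_learning`, the identity initialization, and the learning rate are assumptions for illustration.

```python
def stream_predictive_learning(features, lr=0.1):
    """Toy sketch, not the paper's model: a linear map stands in for the LSTM stack.

    features: per-frame feature vectors (lists of floats), visited once in order.
    Returns the squared prediction error observed at each step.
    """
    dim = len(features[0])
    # Hypothetical predictor: a dim x dim linear map, initialized to the identity.
    weights = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    errors = []
    for current, upcoming in zip(features, features[1:]):
        # Predict the next frame's features from the current frame's features.
        predicted = [sum(weights[i][j] * current[j] for j in range(dim))
                     for i in range(dim)]
        residual = [p - t for p, t in zip(predicted, upcoming)]
        errors.append(sum(r * r for r in residual))
        # Gradient step on half the squared error; the only supervision
        # signal is the next frame itself, so no annotations are needed.
        for i in range(dim):
            for j in range(dim):
                weights[i][j] -= lr * residual[i] * current[j]
    return errors


# A toy "video" whose features alternate between two patterns.
frames = [[1.0, 0.0], [0.0, 1.0]] * 10 + [[1.0, 0.0]]
step_errors = stream_predictive_learning(frames)
```

Because each frame is consumed once and the update happens immediately, the loop never needs to revisit earlier frames, which is the property that makes a streaming, single-pass formulation possible.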
FreeCam3D: Snapshot Structured Light 3D with Freely-Moving Cameras

https://doi.org/10.1007/978-3-030-58583-9_19

Wu, Yicheng ; Boominathan, Vivek ; Zhao, Xiao ; Robinson, Jacob T. ; Kawasaki, Hiroshi ; Sankaranarayanan, Aswin C. ; Veeraraghavan, Ashok ( November 2020 , European Conference on Computer Vision)
Vedaldi, A. ; Bischof, H. ; Brox, T. ; Frahm, JM. (Ed.)
Full Text Available
3PointTM: Faster Measurement of High-Dimensional Transmission Matrices

https://doi.org/10.1007/978-3-030-58598-3_19

Chen, Yujun ; Sharma, Manoj Kumar ; Sabharwal, Ashutosh ; Veeraraghavan, Ashok ; Sankaranarayanan, Aswin C. ( November 2020 , European Conference on Computer Vision)
Vedaldi, A. ; Bischof, H. ; Brox, T. ; Frahm, JM. (Ed.)
Full Text Available